On String Matching with Mismatches

Authors

  • Marius Nicolae
  • Sanguthevar Rajasekaran

Abstract

In this paper, we consider several variants of the pattern matching with mismatches problem. In particular, given a text T = t_1 t_2 ··· t_n and a pattern P = p_1 p_2 ··· p_m, we investigate the following problems: (1) pattern matching with mismatches: for every i, 1 ≤ i ≤ n − m + 1, output the distance between P and t_i t_{i+1} ··· t_{i+m−1}; and (2) pattern matching with k mismatches: output those positions i where the distance between P and t_i t_{i+1} ··· t_{i+m−1} is less than a given threshold k. The distance metric used is the Hamming distance. We present some novel algorithms and techniques for solving these problems. We offer deterministic, randomized and approximation algorithms. We also consider variants of these problems in which wild cards may appear in the text, in the pattern, or in both. Finally, we present an experimental evaluation of these algorithms. The source code is available at http://www.engr.uconn.edu/∼man09004/kmis.zip.
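
To make the two problem statements concrete, here is a minimal brute-force sketch in Python. It is not one of the paper's algorithms: it simply evaluates the definitions in O(nm) time. The function names `mismatch_counts` and `positions_below_threshold`, and the convention that a single designated `wildcard` character matches anything, are assumptions made for illustration.

```python
from typing import List, Optional


def mismatch_counts(text: str, pattern: str,
                    wildcard: Optional[str] = None) -> List[int]:
    """Problem (1): Hamming distance between the pattern and every
    length-m window of the text, computed directly from the definition."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    counts = []
    for i in range(n - m + 1):
        distance = 0
        for j in range(m):
            a, b = text[i + j], pattern[j]
            # Assumed convention: a wild card on either side never mismatches.
            if wildcard is not None and (a == wildcard or b == wildcard):
                continue
            if a != b:
                distance += 1
        counts.append(distance)
    return counts


def positions_below_threshold(text: str, pattern: str, k: int,
                              wildcard: Optional[str] = None) -> List[int]:
    """Problem (2): 1-based positions i whose distance is less than k,
    following the wording of the abstract."""
    counts = mismatch_counts(text, pattern, wildcard)
    return [i + 1 for i, d in enumerate(counts) if d < k]


if __name__ == "__main__":
    print(mismatch_counts("abcabd", "abd"))
    print(positions_below_threshold("abcabd", "abd", 2))
    print(mismatch_counts("abcabd", "a?d", wildcard="?"))
```

A baseline like this is mainly useful as a reference against which faster deterministic, randomized or approximate methods can be checked on small inputs.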

Similar resources

Reduced Nondeterministic Finite Automata for Approximate String Matching

We will show how to reduce the number of states of nondeterministic finite automata for approximate string matching with k mismatches and nondeterministic finite automata for approximate string matching with k differences in the case when we do not need to know how many mismatches or differences are in the found string. We will also show the impact of this reduction on Shift-Or based algorithms.

A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches

This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. The approximate string-matching algorithms have both pleasing theoretical features, as well as direct applications, especially in comp...

On string matching with k mismatches

In this paper we consider several variants of the pattern matching problem. In particular, we investigate the following problems: 1) Pattern matching with k mismatches; 2) Approximate counting of mismatches; and 3) Pattern matching with mismatches. The distance metric used is the Hamming distance. We present some novel algorithms and techniques for solving these problems. Both deterministic and...

Approximate String Matching by Finite Automata

Approximate string matching is a sequential problem and therefore it is possible to solve it using finite automata. A nondeterministic finite automaton is constructed for string matching with k mismatches. It is shown how "dynamic programming" and "shift-and" based algorithms simulate this nondeterministic finite automaton. The corresponding deterministic finite automaton has O...

String Matching with Mismatches by Real-Valued FFT

String matching with mismatches is a basic concept of information retrieval with some kinds of approximation. This paper proposes an FFT-based algorithm for the problem of string matching with mismatches, which computes an estimate with accuracy. The algorithm consists of FFT computations for binary vectors which can be computed faster than the computation for vectors of complex numbers. Theref...

Fast String Matching with Mismatches

We describe and analyze three simple and fast algorithms on the average for solving the problem of string matching with a bounded number of mismatches. These are the naive algorithm, an algorithm based on the Boyer-Moore approach, and ad-hoc deterministic finite automata searching. We include simulation results that compare these algorithms to previous works.
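
Several of the abstracts listed above mention simulating a nondeterministic finite automaton for k mismatches with "shift-and" style bit operations. The following Python sketch shows one common textbook form of that idea; it is not taken from any of the listed papers. It keeps one bit vector per error level, where bit j of level e means "the pattern prefix of length j + 1 matches the text ending here with at most e mismatches". The function name `shift_and_k_mismatches` is hypothetical, and the sketch reports windows with at most k mismatches, as 1-based start positions.

```python
from typing import Dict, List


def shift_and_k_mismatches(text: str, pattern: str, k: int) -> List[int]:
    """Bit-parallel search for windows with at most k mismatches.
    Returns 1-based start positions of the matching windows."""
    m = len(pattern)
    if m == 0 or m > len(text) or k < 0:
        return []
    # masks[c] has bit j set exactly when pattern[j] == c.
    masks: Dict[str, int] = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    full = (1 << m) - 1        # keeps every row to m bits
    accept = 1 << (m - 1)      # bit of the full-length prefix
    rows = [0] * (k + 1)       # rows[e]: active prefixes with <= e mismatches
    starts: List[int] = []
    for pos, ch in enumerate(text):
        char_mask = masks.get(ch, 0)
        prev_old = rows[0]     # value of the level below before this step
        rows[0] = ((rows[0] << 1) | 1) & char_mask
        for e in range(1, k + 1):
            cur_old = rows[e]
            # Either extend at the same level with a matching character,
            # or extend from the level below while spending one mismatch.
            extended = ((cur_old << 1) | 1) & char_mask
            substituted = (prev_old << 1) | 1
            rows[e] = (extended | substituted) & full
            prev_old = cur_old
        if rows[k] & accept:
            starts.append(pos - m + 2)
    return starts


if __name__ == "__main__":
    print(shift_and_k_mismatches("abcabd", "abd", 1))
```

Python integers are arbitrary precision, so the sketch does not restrict m to a machine word; a lower-level implementation would typically need m to fit in one word or split the rows across several words.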

Journal:
  • Algorithms

Volume 8

Publication date: 2015